Data Visualization by Andy Kirk


Author: Andy Kirk
Language: eng
Format: epub
Publisher: Packt Publishing


Twitter provided us with continuation tokens that we could pass back to request the next page of data. Stack Overflow takes a different approach and assigns page numbers, which lets us step through the results with ease. Embedded in the response to every API call is a field called has_more, which is true whenever more pages of data match the current query.

In this code, we use the has_more flag and the page number to perform as many queries as necessary to retrieve all the answers. We use the jQuery function ajax, instead of the more common getJSON function, because we would like to retrieve the data synchronously, so that the entire dataset is available at one time. If your visualization allows data to be added dynamically, then you can relax the async:false requirement.

What's returned is an array of objects, each of which represents an answer to a question. If we give the retrieveQuestionAnswers method an ID such as 901115, we'll get back an array of 50 answers. These arrive over the course of two requests, and the code above merges them into the results array, which is then returned.
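The paging loop described here can be sketched roughly as follows. This is not the book's listing, only a minimal illustration of looping on has_more while incrementing the page number. The names retrieveAllAnswers, fetchPage, and fakeFetchPage are hypothetical, and the fake fetcher (with its page size of 30) stands in for the real network call so the sketch can run on its own. The items and has_more fields follow the wrapper object that the Stack Exchange API returns.

```javascript
// fetchPage is a hypothetical helper: given (questionId, page), it returns the
// parsed API response, shaped like { items: [...], has_more: true | false }.
function retrieveAllAnswers(questionId, fetchPage) {
  var results = [];
  var page = 1; // Stack Exchange page numbers start at 1
  var response;
  do {
    response = fetchPage(questionId, page);
    results = results.concat(response.items); // merge this page into the results
    page += 1;
  } while (response.has_more); // stop once the API reports no more pages
  return results;
}

// Stand-in for the real API call (an assumption for illustration only):
// 50 made-up answers served in pages of 30.
function fakeFetchPage(questionId, page) {
  var pageSize = 30, total = 50, items = [];
  var start = (page - 1) * pageSize;
  for (var i = start; i < Math.min(start + pageSize, total); i++) {
    items.push({ answer_id: i, question_id: questionId });
  }
  return { items: items, has_more: start + pageSize < total };
}

var answers = retrieveAllAnswers(901115, fakeFetchPage);
console.log(answers.length);
```

In the browser, fetchPage would presumably wrap the synchronous jQuery ajax call that the text describes. Passing the fetcher in as a parameter keeps the loop itself easy to exercise without a network connection.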

Each answer contains a number of fields. A list of the fields returned by default can be found at https://api.stackexchange.com/docs/types/answer. For the purpose of our visualization, we're most interested in when the answer was originally posted, its score, and whether it was chosen as the accepted answer. These bits of information can be found in the fields creation_date, score, and is_accepted. We'll ignore the rest of the fields for now.
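As a rough sketch of how those three fields might be reduced to plottable values: the Stack Exchange API reports creation_date as Unix epoch seconds, so an age can be derived by subtracting it from the current time. The helper name toPlotPoint, the choice of days as the unit of age, and the sample values below are all assumptions made for illustration.

```javascript
var SECONDS_PER_DAY = 24 * 60 * 60;

// Reduce a raw answer object to the fields the plot needs.
// nowSeconds is passed in (rather than read from the clock) so the
// result is reproducible.
function toPlotPoint(answer, nowSeconds) {
  return {
    age: (nowSeconds - answer.creation_date) / SECONDS_PER_DAY, // days (assumed unit)
    score: answer.score,
    accepted: answer.is_accepted
  };
}

// Made-up sample answers, not real API output.
var sampleAnswers = [
  { answer_id: 1, creation_date: 1000000000, score: 12, is_accepted: true },
  { answer_id: 2, creation_date: 1000086400, score: 3, is_accepted: false }
];

var now = 1000172800; // two days after the first sample answer
var data = sampleAnswers.map(function (answer) {
  return toPlotPoint(answer, now);
});
console.log(data);
```

With live data, the current time in seconds could be obtained with Math.floor(Date.now() / 1000).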

Now that we have some basic data, we can start thinking about the visualization. We're trying to convey the relationship between the age of an answer and its score. This sounds a lot like a use for a scatter plot. The data points stand on their own and can be placed along two axes, date and points. My theory before starting on this was that older answers will tend to have a higher score, because they've been around longer to gather points. People are programmed to believe that numbers going up are positive, so let's play to that and plot points versus age, which will, if my theory holds, show higher values on the right.

Of course, a plain scatter plot is boring and nothing we couldn't generate in Excel. We'll add some interactivity to it, but to start, we still need a simple scatter plot.

This is easily done with a couple of scales and some circles, as shown in the following code:

var graph = d3.select("#graph");
var axisWidth = 50;
var graphWidth = graph.attr("width");
var graphHeight = graph.attr("height");
var xScale = d3.scale.linear()
    .domain([0, d3.max(data, function(item){ return item.age; })])
    .range([axisWidth, graphWidth - axisWidth]);
var yScale = d3.scale.log()
    .domain([d3.max(data, function(item){ return item.score; }), 1])
    .range([axisWidth, graphHeight - axisWidth]);
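To make the arithmetic behind these two scales concrete, here is a hand-rolled sketch of what a linear scale and a log scale compute. It is not D3's implementation. The functions linearScale and logScale are hypothetical stand-ins, and the graph dimensions and domain maxima are example values chosen for illustration.

```javascript
// Linear scale: map a value in [d0, d1] proportionally onto [r0, r1].
function linearScale(domain, range) {
  return function (value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Log scale: the same interpolation, applied to the logarithms of the values.
function logScale(domain, range) {
  var interpolate = linearScale(
    [Math.log(domain[0]), Math.log(domain[1])],
    range
  );
  return function (value) {
    return interpolate(Math.log(value));
  };
}

// Example dimensions and maxima (assumptions, not taken from real data).
var axisWidth = 50, graphWidth = 500, graphHeight = 300;
var xScale = linearScale([0, 100], [axisWidth, graphWidth - axisWidth]);
var yScale = logScale([1000, 1], [axisWidth, graphHeight - axisWidth]);

console.log(xScale(50));   // halfway along the x range
console.log(yScale(1000)); // highest score sits at the top edge
console.log(yScale(1));    // a score of 1 sits at the bottom edge
console.log(yScale(0));    // not finite: log(0) is undefined
```

Because the y domain runs from the maximum down to 1, high scores land near the top of the SVG, where pixel coordinates are smallest. The last line hints at a practical caveat: an answer with a score of zero or below has no position on a log scale, so such scores would need to be clamped or filtered before plotting.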


